The recent discovery of introner-like elements (ILEs) in six fungal species shed new light on the origin of regular 
spliceosomal introns (RSIs) and the mechanism of intron gains. These novel spliceosomal introns are found in hundreds 
of copies, are longer than RSIs and harbor stable predicted secondary structures. Yet, they are prone to degeneration 
in sequence and length to become indistinguishable from RSIs, suggesting that ILEs are predecessors of most RSIs. In 
most fungi, other near-identical introns were found duplicated in lower numbers in the same gene or in unrelated genes, 
indicating that intron duplication is a widespread phenomenon. However, ILEs are associated with the majority of intron 
gains, suggesting that the other types of duplication are of minor importance to the overall gains of introns. Our data 
support the hypothesis that ILEs' multiplication corresponds to the main mechanism of intron gain in fungi. 



The Proposed Mechanisms for Intron Gain Cannot 
Explain the High Intron Density in Present Day 
Eukaryotic Genomes 

Eukaryotic genes consist of exons that contain the coding 
sequence, and of introns that are non-coding and are removed 
from precursor mRNA (pre-mRNA) after transcription. The spliceosome 
machinery, a large ribonucleoprotein that recognizes specific 
intronic features, catalyzes two consecutive transesterification 
reactions that result in splicing of the nuclear introns and liga- 
tion of adjacent exons. 1 Such a mosaic gene structure is certainly 
one of the most important features that allowed the appearance 
of complex organisms during evolution of higher Eukaryotes. 2 
Indeed, land plants and animals, including humans, have intron- 
rich genomes (> 3 introns per kb coding sequence) as compared 
with simpler organisms such as most fungi (< 3 introns per 
kb coding sequence). 3,4 Yet, more than 30 years after their discovery, 
the origin of spliceosomal introns is still unknown. Analyses of 
gain and loss of introns in diverse eukaryotic lineages kept the 
mystery of intron origin alive because there was less evidence 
for gains than for losses. 4,5 In many Eukaryotes, the 
estimated rates for intron gain and loss cannot explain the high 
intron density in many present-day genomes. Indeed, a higher 
intron loss rate would ultimately result in the disappearance of 
spliceosomal introns. However, some lineages such as fungi have 
experienced more balanced rates of intron gains and losses, 6,7 suggesting 
that intron gains can still occur to a large extent at present. 
In addition to fungi, 6-9 extensive recent intron gains 
have been reported in the micro-crustacean Daphnia pulex. 10 

Several mechanisms have been proposed for intron gains 
and have been recently reviewed in detail. 11 The model that has 
received most support in the scientific community is referred to 
as intron transposition. It involves reverse splicing of a spliced 
intron into the mRNA of another gene, followed by reverse 
transcription and homologous recombination at the gene locus. 
This model is almost identical to the main mechanism proposed 
for intron loss by reverse transcription and homologous recom- 
bination after intron splicing. 11,12 Observations of intron losses 
occurring more frequently at the 3' end of the genes support this 
mechanism. 6,12,13 However, according to these models, the dif- 
ference in rates of intron gain and loss solely depends on the rate 
of reverse splicing, which is expected to occur at low frequency. 14 
Thus, the balanced rates of intron gain and loss in certain lin- 
eages challenge the intron transposition model. Roy and Irimia 
proposed two new models to resolve this paradox: spliceosomal 
retrohoming (reverse splicing of an intron directly into DNA fol- 
lowed by reverse transcription) and template switching during 
reverse transcription. 14 Other mechanisms have also been sug- 
gested including: (1) recombination between two paralogs, one 
containing an intron and the other one intronless (intron trans- 
fer); (2) insertion of a transposable element followed by conver- 
sion to an intron; (3) intronization of an exon by acquisition of 
splicing sites; (4) mobilisation and propagation of a self-splicing 
group II intron from an organelle into the nucleus; (5) insertion 
during DNA double-strand break repair; and finally (6) duplication 
of a genomic segment that contains cryptic splicing sites. 11 
However, only the last mechanism has been experimentally 
proven. 15 All the other models, including intron transposition, 
only rely on indirect evidence and fail to describe how the vast 
majority of introns were gained. 11 It is likely that all proposed 
mechanisms contribute to intron gains to some extent, but the 
frequencies at which they occur cannot explain the high number 
of introns present in numerous Eukaryotes. Therefore, it has been 
suggested that the mechanisms of intron gain in ancestral lineages 
might differ from those that operate in modern Eukaryotes. 5 

Table 1. Identification of multi-copy introns in 24 fungal species 

For each intron of a given fungal species, a BlastN analysis was 
performed against the complete intronome. Then, intron clusters were built 
by grouping a given intron with its near-identical introns. Introns that 
were duplicated only within the same gene were classified as same gene 
duplications (SGD). Near-identical introns found in unrelated genes were 
classified as low-copy introns (LCI) when a search using hidden Markov 
models did not increase the number of members by more than 2-fold; 
they were classified as high-copy introns when this search increased the 
number of members by more than 2-fold. These high-copy introns were 
subsequently named introner-like elements (ILE). 9 a Number of introns. 
The contribution of each duplication type to the total number of duplications 
is indicated as a percentage in brackets. These high-copy introns were not 
retrieved as ILEs by additional, more stringent analyses. 

Intron Duplication is a Widespread Phenomenon 
in Fungi 

A striking observation in the animal Oikopleura dioica 16 and in 
the alga Micromonas pusilla 17 was the presence of introns that 
are nearly identical at the sequence level. In M. pusilla, these 
near-identical introns are present in thousands of copies and 
were named introner elements (IE). Near-identical introns were 
also reported to occur in the fungus Mycosphaerella graminicola. 8 
Recently, we reported on the occurrence of near-identical 
introns in five additional fungal species, where they are present 
in up to five hundred copies. 9 We named these high-copy 
introns introner-like elements (ILE) by analogy with the IEs found in 
M. pusilla. Like regular spliceosomal introns (RSIs), ILEs have 
typical splicing features including canonical acceptor and donor 
sites, branch point sequence and polypyrimidine tracts, which 
suggest that they can be spliced by the spliceosome machinery. 
However, in addition to being present in many near-identical 
copies, ILEs have features that clearly distinguish them from 
RSIs. They are significantly longer and have lower predicted 
Gibbs free energy (ΔG) values, which were ascribed to stable 
predicted secondary structures. A robust gain analysis showed 
that up to 90% of gained introns are ILEs. Because our data 
showed that ILEs quickly degenerate in length and sequence 
to become indistinguishable from RSIs, we hypothesized that 
non-ILE-associated gains are highly degenerated ILEs. Thus, 
most RSIs might originate from ILEs in at least six fungal 
species. 9 

In this study, the very first step of the pipeline that was 
developed to identify ILEs involved a simple BlastN search and 
clustering method, which retrieved three different types of near- 
identical introns. 9 Depending on the number of introns with a 
near-identical sequence and whether they were duplicated within 
the same gene or in different genes, these multi-copy introns 
were classified as same gene duplications (SGD; 82 members), 
low-copy introns (LCI; 302 members) and high-copy introns 
(1226 members) that were subsequently named ILEs. This search 
revealed that intron duplication is a widespread phenomenon in 
fungi because it was found in all species included in the study 
except Aspergillus nidulans (Table 1). However, the contribution 
of each category to the observed duplication events varies. Nine 
species contain only LCIs, while both SGDs and LCIs are found 
in five other species. In the latter, SGDs occur less frequently and 
contribute to 25-54% of the observed duplications (Table 1). 
The remaining six fungal species have all three types of duplicated 
introns, but they also have a very high number of ILEs (24 
to 377), which contribute between 60% and 92% of all duplication 
events (Table 1). Notably, Rhytidhysteron rufulum, 
Fusarium graminearum and Sclerotinia sclerotiorum contain near-identical 
introns in high numbers, but these correspond to repetitive 
elements that inserted within RSIs and were not retrieved as 
ILEs in the subsequent and more stringent steps of ILE identification 
(Table 1). 9 
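
The classification rules described above (and in the legend of Table 1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Intron` and `classify_cluster` are hypothetical, the BlastN and hidden Markov model searches themselves are not reproduced, and the function simply takes the members of one BlastN cluster plus the number of members found after the HMM search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intron:
    """Hypothetical minimal record: an intron and the gene that hosts it."""
    intron_id: str
    gene_id: str


def classify_cluster(blast_members, hmm_member_count):
    """Classify one cluster of near-identical introns.

    blast_members: introns grouped by the initial BlastN clustering.
    hmm_member_count: cluster size after the HMM search (assumed input).
    """
    if len(blast_members) < 2:
        return "NDI"  # non-duplicated intron: nothing to classify
    genes = {intron.gene_id for intron in blast_members}
    if len(genes) == 1:
        return "SGD"  # duplicated only within the same gene
    # Copies sit in different genes: the HMM search decides LCI vs. ILE.
    if hmm_member_count > 2 * len(blast_members):
        return "ILE"  # membership grew by more than 2-fold
    return "LCI"


# Hypothetical clusters for illustration only.
same_gene = [Intron("i1", "geneA"), Intron("i2", "geneA")]
two_genes = [Intron("i3", "geneB"), Intron("i4", "geneC")]
print(classify_cluster(same_gene, 2))   # SGD
print(classify_cluster(two_genes, 3))   # LCI
print(classify_cluster(two_genes, 40))  # ILE
```

The 2-fold threshold is the only numerical criterion taken from the text; how clusters are built and how HMM hits are counted are left outside the sketch.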

Figure 1. Length and stability of the different types of duplicated introns. The length and predicted Gibbs free energy (ΔG) were measured for 
non-duplicated introns (NDI), same gene duplications (SGD), low-copy introns (LCI) and introner-like elements (ILE) from 24 fungal species included in this 
study. 9 (A) Median length and interquartile range are plotted for each type of intron. The median length is indicated above the bars. (B) Mean and SD 
of ΔG values of introns with a length corresponding to the median of each type of intron. A non-parametric Kruskal-Wallis test was performed (p < 
0.0001), followed by a Dunn's pairwise comparison test at α = 0.05 significance level. Only significant differences are indicated. 

Table 2. Single intron gain and loss analysis in fungal species containing ILEs 

Columns: Fungal species; Orthologs; Introns; ILEs; Ancestral introns a; Single gains b; Single losses b; SGD at gain positions c; LCI at gain positions c; ILE at gain positions c 
Rows: Cladosporium fulvum; Mycosphaerella fijiensis 

Single gains and single losses were determined using only one outgroup clade for each species as described in our previous report. 9 Contribution of 
same gene duplications (SGD), low-copy introns (LCI) and introner-like elements (ILE) to single gains was determined. a Intron position conserved in all 
analyzed fungal species; b Introns that are present or absent only in the considered species; c Numbers in brackets are numbers of SGDs, LCIs or ILEs at 
single gain positions divided by the number of single gains. 

As was done in our previous study on ILEs, the length and 
stability of the two other types of near-identical introns were 
measured. The median lengths of SGDs and LCIs are in the same 
range as observed for non-duplicated introns (NDI), but ILEs are 
about twice as long (Fig. 1A). The ΔG of SGDs and 
LCIs is not different from that of NDIs, while ILEs have a significantly 
lower ΔG (Fig. 1B). These results suggest that different 
mechanisms might be involved in the duplication of each intron 
type. SGDs are found in only 11 fungal species and are limited 
in number (maximum of 16 members in a given species). Fifty 
percent of these duplication events represent segmental duplica- 
tion within the same gene because exon sequences on each side 
of these introns are also duplicated. The other 50% might rep- 
resent intron transpositions within the same transcript or intron 
transfers between paralogs. Comparably low numbers were also 
reported in Caenorhabditis elegans, in which only three gained 
introns are SGDs. 18 In Cryptococcus neoformans, a single gene with several 
putative SGDs was also shown to be most likely the result of a 
duplication of exonic repeats." The two other types of multi-copy 
introns are found in different unrelated genes, suggesting that 
they may represent the same type of introns, but differ in mul- 
tiplication frequency. They have different characteristics (length 
and AG), which suggests that different duplication mechanisms 
are involved. However, these differences are also consistent with 
ILE degeneration and LCIs might represent degenerated ILEs. 
This hypothesis might explain why we could not identify more 
introns that would have originated from them. Alternatively, 
LCIs could originate from a low-frequency transposition mechanism. 
Altogether, our results suggest that ILE multiplication is the prevailing 
type of duplication event in fungi, explaining on average 76% of intron 
duplications. 
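
As a quick arithmetic check, the pooled member counts given earlier in the text (82 SGDs, 302 LCIs and 1226 ILEs) can be turned into shares of all duplicated introns. The sketch below is only an illustration: the function name `category_shares` is hypothetical, and pooling all species together is an assumption that need not match how the reported average was derived.

```python
# Member counts reported in the text for the species analyzed.
counts = {"SGD": 82, "LCI": 302, "ILE": 1226}


def category_shares(counts):
    """Return each category's share of all duplicated introns, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = category_shares(counts)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
# SGD: 5.1%
# LCI: 18.8%
# ILE: 76.1%
```

Under this pooling assumption, ILEs account for about 76% of the 1610 duplicated introns, in line with the figure quoted above.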

Introner-Like Elements Reconcile the Intron Gain 
Mechanism in Ancestral and Modern Genomes 

Figure 2. Birth, life and death of spliceosomal introns in fungi. (A) Gained introns are single gains in Cladosporium fulvum, Dothistroma septosporum, 
Mycosphaerella graminicola, Mycosphaerella fijiensis or Septoria musiva as determined in Table 2. Ancestral introns are conserved among all fungi 
included in this study. Lost introns are single losses in one of the five fungal species. Length of lost introns that are still present in the other four 
species was calculated and corrected for outliers using the formula: (sum-max-min)/(length-2). A non-parametric Kruskal-Wallis test was performed (p < 
0.0001), followed by a Dunn's pairwise comparison test at α = 0.05 significance level. Only significant differences are indicated. (B) Length distribution 
of non-duplicated introns (NDIs), introner-like elements (ILEs) and lost introns in the five fungal species listed above. 

Based on the observed degeneration, we speculated that ILEs are 
at the origin of most RSIs in at least six fungal species, which 
implies that they should be associated with intron gains. Indeed, 
ILEs can contribute up to 90% of recent intron gains. 9 An intron 
gain and loss (IGL) analysis in fungal species that contain ILEs 
showed that gains occur on average 10-fold more frequently 
than losses (Table 2). Remarkably, this is also true in Septoria 
musiva, a species that carries only highly degenerated ILEs, 
which initially could not be identified as such. 9 In the IGL analysis 
shown here, up to 50% of the gains are explained by ILEs, 
while almost none are explained by SGDs or LCIs (Table 2). The 
unexplained gains certainly correspond to more ancient gained 
introns that cannot be recognized as ILEs because of the high 
level of degeneration.'- 1 

Our analysis also revealed that introns absent in other species 
are similar in length to ancestral introns that are conserved in 
all fungal species included in this study, although with a much 
lower standard deviation (Fig. 2A). Our findings suggest that the 
majority of new introns originate from ILEs, which subsequently 
lose their stable secondary structure and shorten toward the opti- 
mal intron length, to eventually be lost (Fig. 2B). Accordingly 
in Aspergillus species, it was found that lost introns are signifi- 
cantly shorter than conserved introns. 7 Our proposed model for 
fungal intron birth, life and death is consistent with the high 
intron dynamics observed in fungi, but also with lower dynam- 
ics in higher Eukaryotes, which is most likely related to the dif- 
ferent generation times. Intron-rich genomes usually have longer 
introns, 3 which would hamper their loss. 
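
The outlier correction applied to lost-intron lengths in the legend of Figure 2, (sum-max-min)/(length-2), can be sketched as follows. This is a minimal illustration, assuming that "length" in the formula denotes the number of length values available for an intron (for example, the four species that still carry it); the function name and the example values are hypothetical.

```python
def outlier_corrected_mean(lengths):
    """Mean after dropping one maximum and one minimum value:
    (sum - max - min) / (n - 2)."""
    n = len(lengths)
    if n < 3:
        raise ValueError("need at least three values to drop the extremes")
    return (sum(lengths) - max(lengths) - min(lengths)) / (n - 2)


# Hypothetical lengths (bp) of one intron in the four species that retain it.
example_lengths = [52, 58, 61, 240]
corrected = outlier_corrected_mean(example_lengths)
print(corrected)  # 59.5
```

With only four values, the correction reduces to the mean of the two middle values, so a single unusually long copy (240 bp here) no longer dominates the estimate.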

By analogy with the IEs of M. pusilla, it is very likely that 
genome invasion by introns occurred at least once in 
an ancestral Eukaryotic lineage to give rise to the present-day 
intron-rich Eukaryotes. This hypothesis suggests that the mechanisms 
of intron gain in ancestral and modern genomes are 
the same. From the results presented above, multiplication of 
ILEs in fungi and IEs in M. pusilla is certainly the main mechanism 
of intron gain in these species. Because of the high frequency 
of duplication events, ILE and IE multiplication likely 
involves a mechanism different from those proposed so far. Yet, 
spliceosomal retrohoming is the model that would comply best 
with our observations, but additional concepts are required in 
this model to take into account ILE specific characteristics. The 
predicted stable secondary structures of ILEs seem to be under 
selection pressure as suggested by the many compensatory mutations 
observed in ILEs. 9 It is tempting to speculate that ILE 
secondary structures might significantly contribute to the multiplication 
mechanism. We are now setting up experiments to find 
evidence for the mobility of ILEs and to decipher the mechanism 
of their multiplication. 